E3: an Elastic Execution Engine for Scalable Data Processing
نویسندگان
چکیده
With the unprecedented growth of data generated by mankind nowadays, it has become critical to develop efficient techniques for processing these massive data sets. To tackle such challenges, analytical data processing systems must be extremely efficient, scalable, and flexible as well as economically effective. Recently, Hadoop, an open-source implementation of MapReduce, has gained interests as a promising big data processing system. Although Hadoop offers the desired flexibility and scalability, its performance has been noted to be suboptimal when it is used to process complex analytical tasks. This paper presents E3, an elastic and efficient execution engine for scalable data processing. E3 adopts a “middle” approach between MapReduce and Dryad in that E3 has a simpler communication model than Dryad yet it can support multi-stages job better than MapReduce. E3 avoids reprocessing intermediate results by adopting a stage-based evaluation strategy and collocating data and user-defined (map or reduce) functions into independent processing units for parallel execution. Furthermore, E3 supports block-level indexes, and built-in functions for specifying and optimizing data processing flows. Benchmarking on an in-house cluster shows that E3 achieves significantly better performance than Hadoop, or put it another way, building an elastically scalable and efficient data processing system is possible.
منابع مشابه
Using Tcl to Rapidly Develop a Scalable Engine for Processing Dynamic Application Logic
At Healtheon, we used Tcl to rapidly develop a scalable, high performance rule engine for processing dynamic application logic. The nature of our application requirements, plus the challenge of delivering robust software in a timely manner made Tcl an optimal overall choice in our deployment. We were able to improve rule processing performance by careful language construction and support for co...
متن کاملElastic and Scalable Processing of Linked Stream Data in the Cloud
Linked Stream Data extends the Linked Data paradigm to dynamic data sources. It enables the integration and joint processing of heterogeneous stream data with quasi-static data from the Linked Data Cloud in near-real-time. Several Linked Stream Data processing engines exist but their scalability still needs to be in improved in terms of (static and dynamic) data sizes, number of concurrent quer...
متن کاملElastic Processing of Analytical Query Workloads on IaaS Clouds
Many modern applications require the evaluation of analytical queries on large amounts of data. Such queries entail joins and heavy aggregations that often include user-defined functions (UDF s). The most efficient way to process these specific type of queries is using tree execution plans. In this work, we develop an engine for analytical query processing and a suite of specialized techniques ...
متن کاملIvanova Scalable Scientific Stream Query Processing
Ivanova, M. 2005. Scalable Scientific Stream Query Processing. Acta Universitatis Upsaliensis. Uppsala Dissertations from the Faculty of Science and Technology 66. 137 pp. Uppsala. ISBN 91-554-6351-7 Scientific applications require processing of high-volume on-line streams of numerical data from instruments and simulations. In order to extract information and detect interesting patterns in thes...
متن کاملDistributed Query Processing on the Cloud: the Optique Point of View (Short Paper)
The Optique European project 3 [6] aims at providing an end-to-end solution for scalable access to Big Data integration, where end users will formulate queries based on a familiar conceptualization of the underlying domain. From the users’ queries the Optique platform will automatically generate appropriate queries over the underlying integrated data, optimize and execute them on the Cloud. In ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- JIP
دوره 20 شماره
صفحات -
تاریخ انتشار 2012